home *** CD-ROM | disk | FTP | other *** search
- # $Id: style.prm,v 1.7 1996/09/15 00:36:13 bjohnson Exp $
- # Copyright (C) 1987-1995 Verity, Inc.
- #
- # style.prm - collection schema parameters
- #
- # This file is used to enable/disable index schema features through
- # macro definitions similar to those allowed by the C preprocesser.
- # This file is included in other style files using $include so
- # that the selected features are propogated to the schemas of all
- # tables in the index. Refer to the "Using the style.prm File"
- # chapter in the Collection Buiding Guide for more information.
-
- # -----------------------------------------------------------------
- # The IDX-CONFIG parameter defines the storage format used to
- # encode the word positions in the index. WCT (Word Count) format
- # is a compact format, storing the ordinal counting position of the
- # word from the beginning of the document. PSW (Paragraph, Sentence,
- # Word) format takes approximately 15-20% more disk space, but
- # stores semantically accurate paragraph and sentence boundaries.
- # Optionally, Many may be specified with either WCT or PSW to
- # improve the accuracy of the <MANY> operator at the expense of
- # diskspace and search performance.
-
- # This example enbles Word Count word position format (the default).
- $define IDX-CONFIG "WCT"
-
- # This example turns on Paragraph/Sentence/Word word position format.
- # It also enables the <MANY> operator accuracy improvement.
- #$define IDX-CONFIG "PSW Many"
-
- # -----------------------------------------------------------------
- # The IDXOPTS parameters define which index options are applied to
- # the various index token tables. The following index options are
- # supported for each: Stemdex enables an index by the stem of each
- # word. Casedex stores all case variants of a word separately, so
- # one can search for case sensitive terms such as "Jobs", "Apple",
- # and "NeXT" more easily. Soundex stores phonetic representations
- # of the word, using AT&T's standard soundex algorithm. The
- # application may also store 1-4 bytes of application-specific
- # data with each word instance, in the form of Location data and/or
- # Qualify Instance data. These options are specified separately
- # for each token table: word, zone, and zone attribute.
- $define WORD-IDXOPTS "Stemdex Casedex"
- $define ZONE-IDXOPTS ""
- $define ATTR-IDXOPTS "Casedex"
-
- # -----------------------------------------------------------------
- # Clustering is enabled by uncommenting the DOC-FEATURES line.
- # This stores a feature vector for each document in the
- # Documents table. These features are used for Clustering
- # results and fast Query-by-Example. See the discussions on
- # Clustering in the Collection Building Guide for more information.
- #$define DOC-FEATURES "TF"
-
- # -----------------------------------------------------------------
- # Document Summarization is enabled by uncommenting one of
- # the DOC-SUMMARIES lines below. The summarization data is
- # stored in the documents table so that it might easily be
- # shown when displaying the results of a search.
- # See the discussions on Document Summarization in the
- # Collection Building Guide for more information.
-
- # The example below stores the best three sentences of
- # the document, but not more than 500 bytes.
- #$define DOC-SUMMARIES "XS MaxSents 3 MaxBytes 500"
-
- # The example below stores the first four sentences of
- # the document, but not more than 500 bytes.
- #$define DOC-SUMMARIES "LS MaxSents 4 MaxBytes 500"
-
- # The example below stores the first 150 bytes of
- # the document, with whitespace compressed.
- #$define DOC-SUMMARIES "LB MaxBytes 150"
-
-
-
-
-